Raising the Bar:

Learning and Teaching Better

Data Visualization in R


Presented by Marney Pratt & June Arriens

Contributions by Audrey McCombs & Dan Turner

August 16, 2022


Raise The Bar Interactive Website: https://raisethebar.netlify.app/

email questions to: mcpratt@smith.edu

Why Bar Graphs Are Misleading


Go to Chapter 1 then click on (1) Introduction


https://raisethebar.netlify.app/chapter1

How does the population of Elliptio complanata mussels differ through time and location?

This image shows a map of where mussels were sampled and what an Elliptio complanata mussel looks like

Mussel Density

Table showing the locations (Manhan and Mill), years (2016-2019), number of quadrats sampled (21-56 depending on the year and location), and density of Elliptio complanata mussels in number per square meter

Bar graph of mean shell lengths with standard errors

Let’s answer some questions on the website about mussel size using this bar graph:

https://raisethebar.netlify.app/chapter1

Bar Graphs with Means & Error Bars Can Be Misleading

  • Does not show the distribution, range, or sample size of the actual values
  • Misleads because viewers commonly assume the values fall within the bar rather than above it
  • You can get the same bars with very different distributions
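
The last point can be checked with a few lines of base R. This is only a sketch on made-up numbers (not the mussel measurements): the helper names `rescale()` and `standard_error()` and the target mean of 60 and SD of 10 are assumptions chosen for illustration. Both samples are forced to the same mean, SD, and sample size, so a bar graph with standard error bars would draw them identically, yet their histograms look nothing alike.

```r
# Hypothetical data: two made-up samples, NOT the mussel measurements.
# rescale() and standard_error() are illustrative helpers, not package functions.

# Shift and stretch a vector so it has exactly the requested mean and SD
rescale <- function(x, target_mean, target_sd) {
  (x - mean(x)) / sd(x) * target_sd + target_mean
}

standard_error <- function(x) sd(x) / sqrt(length(x))

set.seed(42)
n <- 30

# One cluster of values around the mean
unimodal <- rescale(rnorm(n), target_mean = 60, target_sd = 10)

# Two separate clusters, one well below and one well above the mean
bimodal <- rescale(
  c(rnorm(n / 2, mean = -2, sd = 0.3), rnorm(n / 2, mean = 2, sd = 0.3)),
  target_mean = 60, target_sd = 10
)

# Same bar height and same error bar for both samples
c(mean = mean(unimodal), se = standard_error(unimodal))
c(mean = mean(bimodal), se = standard_error(bimodal))

# Very different shapes once the distributions are shown
op <- par(mfrow = c(1, 2))
hist(unimodal, main = "One cluster", xlab = "Value")
hist(bimodal, main = "Two clusters", xlab = "Value")
par(op)
```
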

See the Weissgerber et al. 2019 paper in the journal Circulation

Bar graphs do not allow us to answer questions about

  • distribution,
  • sample size, or
  • range

Are there other graph types that can help us answer these questions more effectively?

Resources for Other Useful Graph Types

Note

We use the term “dot plot” here, but a plot that shows all the points for different groups can also be called a beeswarm-style plot, jitter plot, violin scatter plot, column scatter plot, jittered strip plot, or jittered individual value plot, among other names.

Mussel Length Data Graphed in 4 Ways

This image shows the same mussel length data in 4 different plot types: bars that show means and standard errors, jittered dot plots with medians that show the distribution of the data, box plots, and violin plots.

Visualizing Grouped Continuous Data

  • Show all points or distribution when possible
  • Distribution of data & sample size determine summary statistics to use
    • Small sample size makes summary stats less reliable
    • Only use mean & SD if the data are normally distributed
    • Box plots should NOT be used for multimodal distributions

What Graph to Choose When You Have a

Categorical Independent Variable

This image shows a table to help choose a good graph type. See the slide notes on the website for a thorough description.

(adapted from Weissgerber et al. 2019 by Marney Pratt, March 3, 2022)

Mix and Match When Needed

This image shows a graph of the mussel lengths that mixes a violin, box, and dot plot.
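
A mixed plot like this can be built by stacking layers. The sketch below assumes the ggplot2 package is installed and uses made-up lengths for the two sites; the column names `site` and `length_mm`, the millimeter unit, and all the numbers are assumptions for illustration, not the data or code behind the image.

```r
library(ggplot2)

# Hypothetical data: made-up shell lengths for two sites, NOT the real mussel data.
set.seed(1)
mussels <- data.frame(
  site = rep(c("Manhan", "Mill"), each = 40),
  length_mm = c(rnorm(40, mean = 70, sd = 8), rnorm(40, mean = 62, sd = 12))
)

p <- ggplot(mussels, aes(x = site, y = length_mm)) +
  # Violin: overall shape of the distribution
  geom_violin(fill = "grey90") +
  # Narrow box plot: median and quartiles; outlier points are hidden
  # because every point is already drawn by the jitter layer
  geom_boxplot(width = 0.15, outlier.shape = NA) +
  # Dots: every individual value, spread sideways only (height = 0)
  geom_jitter(width = 0.08, height = 0, alpha = 0.5) +
  labs(x = "Site", y = "Shell length (mm)") +
  theme_classic()

print(p)
```

Layers are drawn in the order they are added, so the dots sit on top of the box and the box on top of the violin.
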

Why use R?

  • It’s free, open source
  • TONS of resources & flexibility
    • Lots of packages & functions specific to ecology
    • Online R community
    • Cool graphs: R Graph Gallery
  • Reproducible
    • Many journals now require code for analysis to be shared
    • Everything (clean, analyze, plot, communicate data) all in one
  • Highly transferable skill
  • Used by an increasing number of ecologists

Can You Learn R or Teach it to Students with NO Experience?

YES!

Start with code templates to make different graph types

Tip

Manipulating data (“data wrangling”) is much harder than graphing. For people just starting in R, we recommend doing the data wrangling in a spreadsheet program or using already prepared data. Beginners will have a better experience using R only for graphing with template code.
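
As a rough idea of what a code template can look like, here is a minimal sketch of a jittered dot plot with medians. It is not the template from the website: it assumes the ggplot2 package is installed, and the data frame `my_data`, its columns `group` and `value`, and the numbers in it are hypothetical placeholders that a learner would swap for their own prepared data.

```r
library(ggplot2)

# STEP 1: the data. These values are made up for illustration.
# With your own prepared file you might instead use something like
# my_data <- read.csv("your_file.csv")   # hypothetical file name
set.seed(123)
my_data <- data.frame(
  group = rep(c("A", "B", "C"), each = 12),
  value = c(rnorm(12, mean = 10, sd = 2),
            rnorm(12, mean = 14, sd = 3),
            rnorm(12, mean = 12, sd = 1))
)

# STEP 2: the plot. Change `group`, `value`, and the axis labels to match your data.
dot_plot <- ggplot(my_data, aes(x = group, y = value)) +
  # Show every data point, spread sideways so points do not overlap
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  # Draw a horizontal line at the median of each group
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", width = 0.4, color = "red") +
  labs(x = "Group", y = "Value") +
  theme_classic()

print(dot_plot)
```

Keeping the data step and the plot step separate means a beginner only has to edit a few clearly marked names to reuse the template.
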

Let’s Make Some Graphs!

https://raisethebar.netlify.app/chapter2

Tip

During very first exposures, it can be helpful if the learner doesn’t have to install anything - not R, RStudio, or packages.

The “Raise the Bar” website is one option for exposing novices to how to make some different graphs using R code templates without having to install anything. All you need is a browser and internet access!